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ABSTRACT 

We present in this paper a new Bayesian semi-blind approach for foreground removal in observations 
of the 21-cm signal with interferometers. The technique, which we call HIEMICA (HI Expectation- 
Maximization Independent Component Analysis), is an extension of the Independent Component 
Analysis (ICA) technique developed for two-dimensional (2D) CMB maps to three-dimensional (3D) 

21-cm cosmological signals measured by interferometers. This technique provides a fully Bayesian 
inference of power spectra and maps and separates the foregrounds from signal based on the diversity 
of their power spectra. Only relying on the statistical independence of the components, this approach 
can jointly estimate the 3D power spectrum of the 21-cm signal and, the 2D angular power spectrum 
and the frequency dependence of each foreground component, without any prior assumptions about 
foregrounds. This approach has been tested extensively by applying it to mock data from interfero¬ 
metric 21-cm intensity mapping observations under idealized assumptions of instrumental effects. We 
also discuss the impact when the noise properties are not known completely. As a first step toward 
solving the 21 cm power spectrum analysis problem we compare the semi-blind HIEMICA technique 
with the commonly used Principal Component Analysis (PCA). Under the same idealized circum¬ 
stances the proposed technique provides significantly improved recovery of the power spectrum. This 
technique can be applied straightforwardly to all 21-cm interferometric observations, including epoch 
of reionization measurements, and can be extended to single-dish observations as well. 

Subject headings: cosmology:observations, 21-cm, foreground, instrumentatiominterferometers, meth¬ 
ods: data analysis, statistical 


1. INTRODUCTION 

One of the main goals of modern cosmology is to 
measure the large-scale structure (LSS) of the universe, 
which encodes crucial information about cosmological 
processes. Measuring these large-scale fluctuations is 
the primary tool for understanding cosmological and as- 
trophysical phenomena and determining related param¬ 
eters. Observations (Planck Collaboration et al. 2014b; 
Hinshaw et al. 2013; Smoot et al. 1992) of the cosmic 
microwave background (CMB) precisely probe fluctua¬ 
tions induced by primordial matter density perturbations 
from the last scattering surface at a redshift of z ^ 1100 
to constrain models of inflation (Peiris et al. 2003; Kin¬ 
ney et al. 2006). Spectroscopic galaxy redshift surveys 
such as the 2dF Galaxy Redshift Survey (Colless et al. 
2003), WiggleZ Dark Energy Survey (Drinkwater et al. 
2010), the Sloan Digital Sky Survey (York et al. 2000), 
the LAMOST (Wang et al. 2009) and Baryon Oscillation 
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Spectroscopic Survey (Anderson et al. 2013) have been 
attempting to map out the three-dimensional (3D) struc¬ 
ture of galaxies in the universe, providing an even more 
precise measurement of cosmological parameters. 

Neutral hydrogen (HI) tomography may provide a 
powerful alternative to galaxy surveys. The redshifted 
21-cm hyperfine transition line allows a direct measure¬ 
ment of the redshift (distance) of a source. Using HI to¬ 
mography through the redshifted 21-cm emissions could 
be a relatively inexpensive tool to probe LSS over enor¬ 
mous volumes of our universe, from redshift 0 to ulti¬ 
mately as high as ~ 200, potentially collecting orders 
of magnitude more information than either the CMB or 
galaxy surveys can provide. Cosmological parameters 
could be determined with far greater precision (Mao et al. 
2008), and the properties of dark matter, dark energy, 
neutrinos (Pritchard & Loeb 2012; Morales & Wyithe 
2010; Loeb & Wyithe 2008), and non-Gaussianity (Mal- 
dacena 2003; Loeb & Zaldarriaga 2004; Xu et al. 2015) 
from inflation could be tightly constrained. 21-cm emis- 
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sion may be the only way to explore the dark ages and 
the epoch of reionization (Madau et al. 1997; Chen & 
Miralda-Escude 2004; McQuinn et al. 2006; Furlanetto 
et al. 2006). 

In particular, it is possible to trace the dynamics of 
the expansion of the universe and study the evolution of 
the dark energy by mapping the large scale structure of 
the universe in three dimensions (Seo & Eisenstein 2003). 
Several authors (Morales & Wyithe 2010; Chang et al. 
2008; Wyithe et al. 2008; Loeb & Wyithe 2008; Peter¬ 
son et al. 2006) have proposed a new technique, the so- 
called “21-cm intensity mapping”, for the observation of 
the baryon acoustic oscillations (BAO) in the large scale 
structure. By mapping the 21-cm intensity field with 
an angular resolution of ^ 10 arcmin across the redshift 
range from 0 up to 3, it may be possible to determine pre¬ 
cisely the equation of state of the dark energy (Bull et al. 
2015). The technique has been tested with existing tele¬ 
scopes such as the Green Bank Telescope (GBT) (Chang 
et al. 2010; Masui et al. 2013; Switzer et al. 2013) and 
the Parkes Radio Telescope (Pen et al. 2009). However, 
to obtain the sensitivity and angular resolution neces¬ 
sary to resolve the BAO peaks demands dedicated ra¬ 
dio telescopes with large numbers of receivers and aper¬ 
tures of order 100 m. Dedicated single-aperture tele¬ 
scopes (e.g. FAST (Nan et al. 2011)) are prohibitively 
expensive. Instead, interferometers provide an economi¬ 
cal way to observe large volumes of the sky and measure 
the BAO features efficiently. Additionally, interferome¬ 
try is well-suited to the EoR and BAO measurements: 
visibilities naturally measure Fourier modes of the in¬ 
tensity distribution on sky, which is directly related to 
the power spectrum of statistical fluctuations in the HI 
distribution. Radio interferometric arrays are already 
being designed and constructed. Currently, there are 
several dedicated HI interferometers that plan to detect 
the redshifted 21-cm EoR signal: LOFAR 1 , MWA 2 , 21- 
cmA 3 and PAPER 4 . GMRT 5 is also currently under way 
to detect the 21 cm signal from reionization. Further¬ 
more, several 21-cm BAO experiments have been pro¬ 
posed (including BAOBAB (Pober et al. 2013a), BAO- 
Radio (Ansari et al. 2012b)) or constructed (CHIME 6 , 
and TianLai (Chen 2012)) to conduct low redshift sur¬ 
veys. Next-generation experiments such as SKA 7 will 
enable revolutionary progress in 21-cm tomography. 

However, HI tomography is extremely challenging 
because astrophysical foreground contamination is ex¬ 
pected to be at least four orders of magnitude brighter 
than the 21-cm signal. The success of statistical detec¬ 
tion of the signal relies on robust foreground removal, 
which has been extensively explored over the past decade. 
Different methods have different advantages and biases 
in providing signal estimates. Fortunately, foregrounds 
are expected to have smooth continuum spectra that 
vary slowly with frequency, while the 21-cm signal is ex¬ 
pected to oscillate dramatically over a short frequency 
range and become uncorrelated beyond frequency sepa¬ 

1 http://www.lofar.org 

2 http://www.mwatelescope.org 

3 http://21-cma.bao.ac.cn 

4 http://astro.berkeley.edu/ dbacker/eor/ 

5 http://gmrt.ncra.tifr.res.in 

6 http://chime.phas.ubc.ca 

7 http://www.skatelescope.org 


rations of about 1 MHz (Santos et al. 2005). In prin¬ 
ciple, one can remove slowly varying components to ex¬ 
tract the HI signal from the contaminants although this 
process will always leave some residual contamination. 
The early ideas use a multi-frequency angular cross¬ 
correlation power spectrum to separate foregrounds from 
the 21-cm signal (Di Matteo et al. 2002, 2004; Oh & 
Mack 2003; Zaldarriaga et al. 2004; Santos et al. 2005; 
Cooray & Furlanetto 2004). An alternative approach is 
to subtract foregrounds along each line of sight in the 
sky or in the m;-plane, by fitting the frequency depen¬ 
dence with a particular smooth function (e.g. a poly¬ 
nomial), and then using the residuals to determine the 
21-cm power spectrum (Santos et al. 2005; Wang et al. 
2006; McQuinn et al. 2006; Bowman et al. 2006; Jelic 
et al. 2008; Gleser et al. 2008; Liu et al. 2009a; Harker 
et al. 2009a; Petrovic & Oh 2011; Bowman et al. 2009; 
Harker et al. 2010). Higher-order statistics, like skew¬ 
ness, are also employed to clean foregrounds (Harker 
et al. 2009b). In addition, Liu et al. (2009b); Trott et al. 
(2012); Morales et al. (2006) examined the impact of 
point-source subtraction residuals on 21-cm fluctuations, 
and Morales & Hewitt (2004) found a removal approach 
based on some symmetries in the 3D Fourier represen¬ 
tation of radio interferometric data. More recently, sim¬ 
ulations and observations have shown that even if sky 
foregrounds are spectrally smooth, the chromatic synthe¬ 
sized beam of an interferometer interacting with smooth 
spectrum foregrounds can still imprint unsmooth spec¬ 
tral features on measured foregrounds, contaminating a 
“wedge”-like region (so-called “mode-mixing”) in cylin¬ 
drical (fcj_,fc||) Fourier space (Datta et al. 2010; Vedan- 
tham et al. 2012; Morales et al. 2012; Parsons et al. 2012; 
Trott et al. 2012; Pober et al. 2013a,b; Dillon et al. 2013; 
Liu et al. 2014a,b; Pober et al. 2014). 

In order to subtract foregrounds with minimal prior as¬ 
sumptions on the angular distribution or frequency de¬ 
pendence of foreground components, some blind (non- 
parametric) methods have been extensively studied in 
the literature. By using only the data, those methods 
can automatically find the correlated features and sep¬ 
arate them out into distinct components. A commonly 
used data analysis technique known as Principal Com¬ 
ponent Analysis (PCA) has been applied to foreground 
removal and successfully tested on simulated data and 
real observations (de Oliveira-Costa et al. 2008; Chang 
et al. 2010; Masui et al. 2013; Switzer et al. 2013). Other 
non-parametric methods, including the FASTICA algo¬ 
rithm (Hyvarinen 1999; Hyvarinen & Oja 2000; Chap¬ 
man et al. 2012; Wolz et al. 2014) based on indepen¬ 
dent component analysis and singular value decompo¬ 
sition (SVD) together with an analysis based on the 
Karhunen-Loeve transform (Shaw et al. 2014), provide 
encouraging results. The FASTICA algorithm is based 
on the maximization of non-Gaussianity (neg-entropy), 
a measure of the deviation of a mixture of signals from a 
Gaussian distribution. However, the disadvantage of this 
approach is that the noise leakage into the reconstructed 
signal is in principle hard to estimate since the signal 
and noise essentially have the same statistical distribu¬ 
tion, which is completely uncorrelated across frequencies. 
Other methods such as the Correlated Component Anal¬ 
ysis (CCA) (Bonaldi & Brown 2015) can be referred to as 
“semi-blind” as they require some prior knowledge. By 
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using second-order statistics CCA also provides a promis¬ 
ing way to separate the 21-cm reionization signal from 
foreground contamination for SKA measurements. Re¬ 
cently Vansyngel et al. (2014) have provided a unified 
approach for semi-blind foreground cleaning from multi¬ 
frequency CMB data. 

In this study, we describe HIEMICA, the HI 
Expectation-Maximization Independent Component 
Analysis algorithm, a Bayesian framework to infer power 
spectra and maps of the HI signal and foregrounds 
from “dirty” data cubes. This technique extends the 
Spectral Matching ICA (SMICA) (Cardoso et al. 2008; 
Delabrouille et al. 2003; Planck Collaboration et al. 
2014a) approach from two-dimensional (2D) CMB maps 
to the three-dimensional (3D) 21-cm cosmological signal 
and can be considered a semi-blind source separation 
method. The statistical isotropy of the cosmological 
Gaussian random fields in 3D Fourier space provides a 
characteristic and unique signature to distinguish them 
from the statistically non-isotropic foregrounds; in the 
frequency (redshift, or radial) direction, astrophysical 
foreground sources contain only spectral information, 
which has no connection with their spatial distributions 
in the transverse direction. 

The purpose of this work is to take a first step toward 
building a new framework for 21cm data analysis. Our 
paper provides a complete description of the mathemat¬ 
ical formalism and numerical techniques, which in prin¬ 
ciple can be applied to any realistic array design. Here 
we test the approach for an idealized array; we propose 
to dedicate a future paper to systematic tests of arrays 
with realistic instrumental effects. 

This paper is organized as follows: Sect. 2 describes the 
HIEMICA semi-blind component separation algorithm; 
Sect. 3 summarizes details of our performed simulations; 
Sect. 4 presents the application of HIEMICA to simu¬ 
lated sky maps and shows the main results. Finally, 
Sect. 5 provides discussion and concluding remarks. 

2. THE HIEMICA METHOD 
2.1. Radio interferometric measurements 

For a multichannel interferometric array, the funda¬ 
mental observable is the visibility , U(u, z/), which is the 
angular Fourier mode of the intensity fluctuations on the 
sky measured by a baseline u in a frequency channel v. 
In the flat-sky approximation, the complex visibility can 
be written simply as the 2D Fourier transform of the 
beam-modulated sky intensity: 


the intensity distribution T(0,z/) and associated power 
spectrum using FFT-based algorithms. If we assume n ± 
pixels per 2D image (real or Fourier space) with even 
spacings in the 0 X - and ^-directions (i.e. two transverse 
directions), and an instrument with frequency chan¬ 
nels of an equal spectral resolution An to probe the sky 
signal in a radial direction, then we have a 3D uniform 
Cartesian grid of size n ± ri\\. Throughtout the paper, the 
symbols _L and || denote the two transverse directions 
and the radial direction, respectively. 

Following Sutter et al. (2012), the radio interferomet¬ 
ric measurement equation of Eq. 1 at the frequency zq, 
for i = 1, 2, • • • , ri||, can be recast in a discrete operator 
formalism by 

y±N = V l Vi]F ± K[v i ]F~ 1 'ip 1 _\y i ] + nJi/*], (2) 


where y ± [ui] =_(y[ui, Vi\, ■ ■ ■ ,y[u nj _,Vi}) T and /*] = 
(^[rzi, zq], • •« ,^[w ni Tj) T denote, respectively, the dis¬ 
cretized visibilities in the uv- plane and the 2D Fourier 
“image” of the true angular sky a]) of 21-cm 

and foregrounds at the i-th channel. The vector 
ri x [zq] = (n[ui, zq], • • • ,n[u n± ,ni\) T denotes the instru¬ 
mental noise at the i-th channel for all the uv points. 
Hereafter, bold, upper and lower case letters denote a 
matrix and a vector, respectively, and the symbol 
is used for Fourier components. F x represents the 2D 
Fourier transform operator performed in the transverse 
direction, converting from the real-space domain into the 
mz-space domain (of course, F” 1 is its inverse), and A[zq] 
represents the primary beam pattern of the interferom¬ 
eter measured at the frequency zq and implicitly is a 
n ± x n ± diagonal matrix. The operator I[zq] specifies 
the visibility sampling function determined by the dis¬ 
tribution of baselines and can also be represented by a 
n± x n ± diagonal matrix that has l’s and 0’s on the di¬ 
agonal to indicate whether a given pixel in rzu-space has 
been observed or not. 

It is convenient to introduce a vector fornq for nu¬ 
merical operations. By stacking each vector ^ x [zq] as 

r ~ i T 


4 >±= €N,-‘,€k] 


, Eq. 2 can account for all 


the frequency channels when rewritten as 
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0 ) 
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1° 

*"ll/ 

with = I|/ / ; F A //; F 1 , 




(3) 


V(u,v) = Jd 2 6A(0,is)T(0,v)e- 2niue , (1) 

where 6 denotes the angular sky position, T(0, v) is the 
sky temperature, A(Q,v) is the primary beam pattern, 
which determines the observed area of the sky, and the 
baseline vector u is on the uv- plane perpendicular to the 
direction of the incoming signals, and measures the sep¬ 
aration between the two antenna pairs in units of wave¬ 
length, i.e., u = v/c Ar. As the physical separation Ar 
of each pair of antennas is assumed to be fixed, each fre¬ 
quency channel thus gives an independent set of visibil¬ 
ity samplings in the mz-plane with frequency-dependent 
baseline vectors. 

In practice, especially for 21-cm tomography, one dis¬ 
cretizes the 3D sky into voxels so as to efficiently estimate 


where is a vector with length and $ is a block 

diagonal matrix, while y = [y x [^i],-‘* ,yIVn„]] T , and 

n = [rrf [zq], • • * , n^[z/ n J] T , each collect vectors for all 
frequencies into an n x n y -element vector. 

Notice that Eq. 3 is presented on a uniform grid in 
order to enable the FFT operation. Realistic observa¬ 
tions from radio interferometers do not sample visibility 
data on a uniform grid in rzu-space. Therefore, analyzing 
the data on a uniform grid requires an additional pro¬ 
cess, “ gridding ”, to map raw visibilities from specified uv 
points to grids with even spacing. However, the gridding 
process can distort the signal estimation, especially for 
small 21-cm signals. Alternatively, one can use the Non 
Uniform Discrete Fourier Transform (NuDFT) (Fessler 
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& Sutton 2003) to directly calculate visibility data at 
continuous coordinates so as to avoid “gridding” effects. 
We leave an investigation of the gridding effects and a 
possible solution by NuDFT to future work. 

2 . 2 . Linear mixture model 

The 21-cm signal along the line-of-sight for each indi¬ 
vidual pixel is not a smooth function of frequency, while 
most astrophysical contaminants should have smooth 
power-law spectral structures. If we assume that the 
total foreground emission contributing to the observed 
data in a given frequency channel can be expressed as a 
linear superposition of a few components and the spectral 
behavior of each component does not vary across the sky, 
the true sky temperature in pixel 0 *,, for k = 1 , 2 , • • • ,n l5 
in the i-th channel then can be modelled as 

n c 

ipWi,0k] = s[vi,0 k \ + (4) 

3 = 1 

where Vi is the frequency, Ok is the pixel position, n c 
the assumed total number of foreground components, 
s[vi,6k\ the 21 -cm signal and fj[Ok] the j- th foreground 
component. reflects the contribution of the j -th 

foreground component to the i-th frequency channel and 
is assumed to be constant and independent of and 
thus the matrix M is commonly called the “mixing ma¬ 
trix?' with dimensions of x n c . Each column of M en¬ 
codes the frequency-dependence of each foreground com¬ 
ponent. 

As the Fourier modes of the isotropic 21 -cm signal 
s(z/, 0) are expected to be mutually independent, express¬ 
ing its temperature in the Fourier coordinates ( 77 , u) is 
particularly useful. By taking the 3D Fourier transform 
along the frequency axis and two angular directions, one 
has 

3 ( 1 1, u) = J s{y, 0) e" 2 ™ (^+ u - 0 ) dddu , (5) 

where r] is the Fourier dual of the frequency variable v. 
Note that here we perform the “true” Fourier transform 
of the sky temperature along the line-of-sight direction 
that is perpendicular to the uv- plane, rather than a u de¬ 
lay transform ”. The delay transform is the Fourier trans¬ 
form in frequency of the spectrum measured by a single 
baseline, which varies with frequency (i.e., u is a func¬ 
tion of v in Eq. 5) and which leads to the “mode-mixing” 
phenomenon (see Liu et al. (2014a) for details). 

The foreground components can also be expressed in 
terms of their 2D Fourier modes. For the j- th compo¬ 
nent, one has 

fj{ u ) = Jm e -2 ™" 0 do (j = 1 , • • • , n c ). ( 6 ) 

According to the above equations, by performing the 
angular Fourier transform on Eq. 4 and expressing the 
21-cm signal in terms of its 3D Fourier modes, Eq. 4 can 
be rewritten in a discretized version as 

n \\ n c 

i’Wi, Uk\ = F F »A-| + y , (7) 

m= 1 j =1 

where i denotes the frequency channel number, k de¬ 
notes pixel position in the uv- plane and we have intro¬ 
duced the operator (matrix) F^ 1 , with its elements given 


by Fourier coefficients ra] (i = 1, • • * f rijfra = 

1, • • • , Tin), to represent the 1-D inverse Fourier transform 
converting the ^-domain to the frequency domain. 

Using matrix multiplications, we can rewrite the linear 
system above as the matrix equation: 

# = F“ 1 S + MF, ( 8 ) 

where $[i'i,Uk\ is the (i,fc)-th entry of the matrix Tq 
s[rj m , Uk] is the (m, k)- th entry of the matrix S, and fj [uk\ 
the (j, k )-th entry of the matrix F. Applying a “vector- 
ization” operator (vec[-]) that converts the matrix into a 
column vector by stacking the columns into a long col¬ 
umn vector, one can express in Eq. 3 with \j> T by 

ij> ± = vec[T rT ] = vec[(F“ 1 S) T ] + vec[(MF) T ] . (9) 

Substituting Eq. 9 into 3, and using the identity 
vec [PQ] = (Q T ® Im) vec[P] for any matrices P mxn and 
Q nxp , where ( 8 ) denotes the Kronecker product and I m 
is an (m x m) identity matrix, yields 

<K = (B®I n Jx, B=(F- 1 M), (10) 

where x = (s T , f T ) T with s = vec[S T ], f = vec[F T ], 
I n± is the n_i_ x n ± identity matrix, and B is a par¬ 
titioned matrix with dimensions of x (n y + n c ). 
Explicitly, s and f are vectors of length n ± n^ and 
n ± n c , respectively. They are constructed by stacking 

S [rim] and f ) as s = (s T [r?i],-- - ,s T [r? ra J) and f = 

p T , ■ ■ • ,fj c ) j where the vectors s[//,,,] and f ), each 

with length %, denote an image of the 3D Fourier 
modes of the 21 -cm signal at 7 y m and an image of the 
2D Fourier modes of the j -th foreground component, re- 
spectively, i.e., s [rf m ] = (s[«i, rf m ], • • • ,§[u n± ,r] m ]) and 

t = (/jNc" ,/jKil)' 

By inserting Eq. 10 into Eq. 3, the measurement equa¬ 
tion finally can be expressed as 

y = Hx + n , H = $(B 0 l ni ). (11) 

2.3. ICA assumption 

The Independent Component Analysis (ICA) assump¬ 
tion is that the data can be considered as a linear mix¬ 
ture of a set of statistically mutually independent com¬ 
ponents. The cosmological 21-cm signal is expected to 
be well approximated by an isotropic Gaussian random 
field and uncorrelated with foregrounds. If we also as¬ 
sume that the diffuse foregrounds consist of several sta¬ 
tistically Independent Components (ICs), each of them 
also being an isotropic Gaussian random field with zero 
mean, as defined in Eq. 10, the covariance matrix of x 
then becomes a diagonal matrix, namely 



The unknown diagonal matrices (ss^) and are to 

be estimated from the observed data. They determine 
the 3D power spectrum of the HI signal and the angular 
power spectra of the ICs, defined through 

(s\UkiTlm\s fak' 5 Vm']) = Vm)^kk'^mm' ( 13 ) 

(/j( u fc)/j'( M fc')) =Sjj'6 k k'C° f (£ = 27r|u|) (14) 
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where PmiNk^Pm) denotes the 3D power spectrum of 
the 21-cm signal as a function of u and ij, and Cj(£) 
denotes the angular power spectrum of the j-th fore¬ 
ground component as a function of multipole £. The 
relation £ = 27r|u|, where u is determined by the pixel 
position Uh, has been used in the flat-sky approxima¬ 
tion. Notice that physical sources such as the Galactic 
synchrotron and free-free emissions actually do have a 
non-zero cross-correlation and their spatial distributions 
clearly appear to be non-isotropic. Nevertheless, as we 
will demonstrate, the ICA assumption can be regarded 
as an effective decomposition of sources and does not 
appear to affect our ability to remove foregrounds. 


2.3.1. Noise 


We also assume that the instrument noise is an uncor¬ 
related Gaussian distribution and can be obtained from 
a reasonable noise model through 

(' h(u k ,p i )h*(u k >,i' i ')}=P N (u k ,Pi)6 k k'5u> ■ (15) 


The covariance matrix thus has the form of a known 
block diagonal matrix (with n defined in Eq. 3), where 
each block is also diagonal, 


N = (mT) 


/Ni 

u 


0\ 

N «ll/ 


(16) 


Here denotes the n ± x n ± covariance matrix of the 
noise map in the uv -plane at the i-th frequency channel, 
which is also diagonal. 


2.4. The EM algorithm 

The goal of our analysis is to identify and separate the 
components from visibility data that contains mixtures 
of foregrounds and signal. Using the Bayesian frame¬ 
work proposed by Snoussi et al. (2002) for CMB data, 
we will show that, without any assumption on priors, by 
performing a semi-blind independent component analy¬ 
sis for which only the noise covariance matrix N is well 
known, one can successfully separate the components by 
jointly estimating the covariance matrix C and the mix¬ 
ing matrix M, and can accurately estimate the power 
spectrum of the HI signal. 

In Bayesian inference, information about unknown pa¬ 
rameters 0 that we want to estimate from the data y is 
expressed in the form of a posterior probability distri¬ 
bution. Using Bayes’ theorem, it can be computed as 
p(0 |y) oc C(y\0)p(0) where C(y\Q) is the likelihood and 
p(0) is the prior distribution of 0; 0 = (C,M) in our 
case. If we assume flat priors for the mixing matrix and 
the power spectra of all sources, and assume they are 
uncorrelated and independent, the prior then reduces to 
P(0) = P(C)P(M) oc 1. Thus, exploring the observed- 
data posterior p(y\0) is equivalent to exploring the like¬ 
lihood. Following Snoussi et al. (2002), given the data 
model in Eq. 11, y = Hx + n, the mixing matrix M and 
the covariance matrix C can be estimated by maximizing 
the likelihood function. For independent and Gaussian 
sources, the likelihood can be expressed as 

—21n£(y|0)ocln|HCH t +N| (17) 

+Tr [(HCH t + N) -1 yy t ] . 


Unfortunately, in general, such a likelihood evalua¬ 
tion is computationally intractable when applied to 
large datasets with realistic interferometric observa¬ 
tions. However, in the analogous case of CMB imag¬ 
ing observations’*’, this likelihood can be approximated 
by the SMICA-likelihood (Delabrouille et al. 2003) to 
measure a “spectral mismatch” in the Fourier domain 
between the empirical covariance matrices of the data 
and their ensemble averages, which depend on the esti¬ 
mated parameters. In practice, minimization of the spec¬ 
tral mismatch is equivalent to maximizing the likelihood 
and can be achieved with the Expectation-Maximization 
(EM) algorithm. The EM algorithm has been applied to 
estimate the CMB power spectrum and reconstruct the 
CMB map from multi-frequency microwave maps (De¬ 
labrouille et al. 2003). The EM algorithm is an elabo¬ 
rate technique to find the maximum-likelihood estimate 
of the parameters when directly evaluating the likelihood 
function is intractable. The calculation is simplified by 
assuming the existence of additional but missing param¬ 
eters. It is an iterative algorithm to repeatedly solve a 
tractable complete-data problem instead of solving a dif¬ 
ficult incomplete-data problem. 

Here we briefly summarize the EM algorithm when ap¬ 
plied to our model. The likelihood p(y\0) first can be ob¬ 
tained by marginalizing the joint distribution p(y, x|0) 
over the missing data x as 

p(y\0) = Jp(y,x\d)dx. (18) 

The key idea is that the EM algorithm does not maxi¬ 
mize p(y\0) directly; instead, it maximizes the so-called 
EM-functional as follows: 

Q(d\6 n ) =E [lnp(y,x|0)|y, 0 ra ] 

= / lnp(y,x|0)p(x|y,0 n )dx, (19) 

which is the expected value of the complete-data log- 
likelihood with respect to the missing data x given the 
observed data y and the current parameter estimates 0 n . 
The evaluation of this expectation is called the Expecta¬ 
tion (E)-step of the algorithm. 

In our model, the prior distribution of the complex 
Fourier modes x is assumed to be Gaussian with zero 
mean, 

p(x|C) oc exp ^-ix t C _1 x^ , (20) 

where the diagonal covariance matrix C is defined in 
Eq. 12. Within a Bayesian framework, the joint posterior 
distribution p(y,x|0) can be expressed by p(y,x|0) = 
p(y|x, 0)p(x|0), yielding 

-2 lnp(y, x|0) = In |N| + (y - Hx) t N“ 1 (y - Hx) 

+ ln |C| + x^C _1 x + cst. (21) 

Using Bayes’ rule, the conditional probability distribu¬ 
tion function p(x|y, 0 n ) for the signal given the data is 
also the Gaussian, 

P(x|y,0”) ocexp i(x-x) t S“ 1 (x-x)^ , (22) 

^ The convolution with a frequency-dependent beam response in 
real space becomes a simple product in Fourier space. 
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where 


where 


X = [H t N _1 H + C _1 ] _1 H t N _1 y, (23) 

S=(H t r 1 H + C“ 1 )“ 1 , (24) 


where x is the so-called Wiener-Filtered (WF) 
map and X) is the corresponding covariance T, = 
((x — x)(x — x)’*’). The solution for x is the general map¬ 
making problem in cosmology. This WF map can be 
computed efficiently by implementing a preconditioned 
conjugate-gradient method that allows one to iteratively 
reach the solution in a tractable amount of computation 
time. 

Using the above Eqs. 21 and 22, we integrate out x to 
derive the expression of Eq. 19 given by 


Q(G\O n ) = cst. - i In |N| - \ In |C| - ^Tr 


c _1 rL 


(25) 




N' 


"C 


R yy - HRt x - R yx Ht + HR xx H t 


where 


Ryy = yy f , Ryx = yx f , Rxx = s + XX* . (26) 

Notice here R yy , R yx and R xx only depend on 0 n rather 
than 0 . 

The second step, called the Maximization (M)-step , 
updates the parameters by maximizing the expectation 
we computed in the previous E-step. These two steps 
are repeated as necessary. This procedure is guaranteed 
to increase the likelihood p(y\0) monotonically with suc¬ 
cessive iterations. In order to obtain the parameter 0 n+1 
at iteration n + 1, we solve the gradient equation with 
respect to M and C to maximize the functional Q. To 
do so, let us first introduce some notation that allows us 
to express the derivative simply. 

Let x and T, be partitioned as follows: 

x=(! t ,F) T (27) 

with 

S= (slfoi].-" ,il [Vn l{ ]) ,f = (fjT" 4 j c ) -( 28 ) 

Here x is equally divided into (riy + n c ) subvectors, each 
with length %, s±[rn] denotes the WF map of the 21- 
cm signal at rp, and f ) denotes the WF map of the j-th 
foreground component. Similarly 


Sf| = ((%]-I[ ?7i ])(ste]-l[ ??i ])t}, 
Sf/=((sM-I[ ?7i ])(%]-f[r ? ,])t), 

s f/= {(f[vi\ - hvi])(ffo} - , 


each with dimensions of n ± x n ± . 

The gradient equation then reads: 

= -(C- 1 ) + C^RxxCT 1 = 0 


dQ 

dMij 


= Tr 


N- 1 + (*t,yh) 


ait [N” 1 HR xx Ht)] 


dMi 


= 0 


(31) 

(32) 

(33) 

(34) 


(35) 


where the derivative of the last term can be well approx¬ 
imated by 


aTr[N -1 HR xx Ht)] 

dM~i 


(36) 


n c ^ ^ ^ ^ 

E M ik Tr [n- 1 ^, (e# + ffcfj + (S" + fjfcftjt) * 


k=1 


where = I[i/i]F ± A[i/i]F ± 1 as defined in Eq. 2, and we 

have neglected the terms that include and 5U S , al¬ 
though the complete expression can be obtained straight¬ 
forwardly. This is an excellent approximation for 21-cm 
foregrounds since they are several orders of magnitude 
brighter than the 21-cm signal’*’. By making this ap¬ 
proximation, we are able to dramatically speed up the 
computation. According to Eqs. 34, 35 and 37, one can 
establish the update scheme as follows: 






(Rx 


P f ?'[ 1,1] •• 

Wv, i] • • 


• pf 7 [l,n=] ^ 

■ p f i f [n c ,n c }J 


(37) 

(38) 


where denotes the i-th row of the mixing matrix, 
and 


Tr 


p(h k J]= Re { 

p{ v \j] = Re {Tr [n, 




!]} 


} (39) 
(40) 


E = 




e 5 A 

'E-ff) 


(29) 


with 

/ Eft • 

‘ ‘ 2 j ln± 

eA = 

IAt 

* jLj n±n± 


/ yLf .. 

. y»// \ 

L ln c 


. y»// 

\ L n c l 

. y/7 j 

n c n c / 


( 

Y» 5 / \ 

i y s 7 . 

Wi 

• • H s f j 

n±n c / 


(30) 


One can verify that, by setting <I>^ as an identity matrix, 
for i = 1, 2, • • • , |, such that no instrumental effects are 

present, this is in complete agreement with the structure 
of the expression for the mixing matrix in the literature 
(Snoussi et al. 2002; Delabrouille et al. 2003). 

The problem is evaluating the matrix XI from its in¬ 
verse as given by Eq. 24 in order to obtain the covari¬ 
ance matrix C from Eq. 38. Since the matrix X) has 
the dimensions of n_ L (n ll + n c ) x n_ L (n ll + n c ), which is 
typically of order 10 6 x 10 6 in a 21-cm survey, such ma¬ 
trix inversion is completely prohibitive computationally. 

t Notice that, in the case of CMB, the evaluation of the mixing 
matrix not only depends on CMB foregrounds but also on the CMB 
signal itself since they are roughly comparable. 
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However, as the power spectrum is only determined by 
the diagonal components of X), not by the off-diagonals, 
by using the same trick presented in the Gibbs sampling 
techniques (Wandelt et al. 2004; Jewell et al. 2004) for 
CMB data analysis, one can verify that the ensemble- 
averaged solution for the following linear equation has 
the desired covariance. By solving 

[H+N^H + C -1 ]z = H t N _1 / 2 zi + C“ 1/2 z 2 (41) 

where the real vectors zi and z 2 are of length and 

n ± (n c + ri||), respectively, with elements drawn from a 
standard normal distribution, one can find that T, = 
(zz - *") where (•) represents the average over all solu¬ 
tions for a large number of realizations of zi and z 2 . 
This algorithm can rapidly provide accurate approxi¬ 
mate solutions in massively parallel computers. There¬ 
fore one can obtain the diagonal components of R xx in 
Eq. 38 efficiently by (R xx )n = (|^| 2 ) + \xi\ 2 where i 
runs over all elements. In the same way as in x, the 
vector z with length (riy + n c )n ± can also be equally 
split into ( 77 -n + n c ) subvectors, each with length %, as 

z = (( z i) T ,--' ,( z n||) T -( z i) T >-- - ,( z n'j T ) , in corre- 
spondence with contributions from the 21-cm signal, de¬ 
noted by zf for i = 1, • • • ,%, and the ICs, denoted by z{ 
for i = 1, • • • , n c . Note that, in general the noise could be 
correlated, resulting in non-zero off-diagonal components 
in the noise covariance matrix, N. Solving Eq. 41 will 
require multiplication by a dense n ± n\\ x n ± ri\\ inverse 
noise covariance matrix, with a computational scaling of 
(9(n 2 n 2 ). In the most general case, inverting N is a sig¬ 
nificant computational challenge. However, the inverse 
noise covariance matrix needs only to be computed once 
and stored, and then can be used many times to compute 
Eqs. 39, 40 and 41. 

Using the above mentioned trick, the trace term in 
Eq. 39 can be computed merely by the ensemble averaged 
dot products of two vectors as 

T*[Ny *;(£{/+ fJj)*V = 

((^*zf) t (N“ 1 $jz{)) + , (42) 

where the vectors zj and z{ represent the subvectors of 
z solved by Eq. 41. The above equation applies also to 
the case of correlated noise. 

Moreover, in each EM iteration, we fix the mag¬ 
nitude of each column of M to unity and adjust 
the corresponding power spectra of the ICs accord¬ 
ingly, similar to Delabrouille et al. (2003), to break 
the degeneracy and keep the scale-invariant product 

(7K]/j*K]) = Ej MijMvjCfakl un- 

changed for an arbitrary pixel u & at any frequency chan¬ 
nels i and i', as seen from Eq. 7. 

After updating M and C, since the quantities, 
for the ICs and k^k^Pm for the HI signal are expected 
to vary more slowly than C\ and Phi themselves, it is 
more appropriate to perform bin averages after each it¬ 
eration and update the associated elements in C, while 
ensuring the number of observations be smaller than the 
number of the estimated parameters in C and M. The 
simulation shows that using band-averaged spectra can 


significantly increase the convergence speed and obtain 
much more stable spectra with high accuracy in itera¬ 
tions. According to Eq 13, the bin-averages then yield 
the following estimates: 

C/N ( Y2 i u n E i u i 2<:7 /( u ) c 43 ) 

\|u|e£q / \u\eD q 

-Pm(u,r?)^( r ?l u l 2 'j E v\^\ 2 Pm(n,ri) (44) 

fu| eD q |u|e£q 

rieD z r]eD z 

where D q is the set of u values contributing to bin q 
and D z is the set of r] values contributing to bin z. Al¬ 
ternatively, if the 21 -cm signal is expected to be highly 
isotropic, it is also appropriate to perform bin averages 
by averaging over spherical shells of constant \k\ to obtain 
the spherically averaged HI power spectrum. The bin¬ 
averaging schemes mentioned above are especially nec¬ 
essary at the beginning of iterations with a poor initial 
guess since bin averaging can smooth out any unreason¬ 
able values and highly suppress foreground contamina¬ 
tion that is expected to significantly increase in strength 
towards low k {l and would result in a substantial overes¬ 
timate of HI power spectrum. One has to keep using the 
bin averaging process until the derived amplitude of the 
HI signal in each bin converges to a reasonable range. 

In addition, although the bin-widths can be chosen ar¬ 
bitrarily, an appropriate choice can reduce the correlation 
between the band-power estimates while keeping accu¬ 
rate detection on the structure of the power spectrum. 
The minimum bin-width can be approximated by the 
typical correlation length in Fourier space. For an inter¬ 
ferometer, one may choose a bin-width of perpendicular 
modes, Au, greater than the characteristic width of the 
Fourier transformed primary beam pattern A(x), and a 
bin-width of parallel modes, A 77 , roughly greater than 
the inverse of the bandwidth. 

The EM algorithm iterates until there is no significant 
change in the likelihood. The stopping criterion of the it¬ 
erative process is here set in terms of the relative change 
of the bin-averaged HI power spectrum in the last itera¬ 
tions, i.e., |Phi + — Pm\/\Pm\ < 10 -3 for all bins, and a 
typical number of iterations is a few hundred. 

2.5. The initial guess 

Since the EM algorithm is a hill-climbing approach, the 
searching procedure may fail to reach a global maximum 
and instead converge on a local maximum if the likeli¬ 
hood functions are not convex. In this study, we adapt 
a commonly used strategy to solve this problem. We try 
many different initial values varied in reasonable ranges 
and choose the solution that has the highest converged 
likelihood value. After a careful investigation, we chose 
the following algorithm for initialization. For the initial 
value of M, we perform PCA analysis de Oliveira-Costa 
et al. (2008) to obtain the n c eigenvectors associated with 
the largest eigenvalues for the frequency-frequency co- 
variance matrix estimated by averaging over all pixels 
of data, and use those vectors as the n c columns of the 
mixing matrix M. After the mixing matrix is initialized, 
the power spectra Cj are chosen to be the corresponding 



diagonal elements in x(x)t, which is solved using Eq. 23 
and setting C -1 = 0 such that there is no prior informa¬ 
tion about C; Eq. 23 reduces to a standard map-making 
equation. After initializing M and Cj, we assume a flat 
power spectrum with amplitude comparable to the noise 
level as prior information about the HI signal. The sim¬ 
ulations show that the HI signal reconstruction is quite 
insensitive to the initial guess for its power spectrum as 
long as it is not unreasonably large. 

3. SIMULATIONS 

We perform simulations to generate dirty sky map data 
cubes. These data cubes include realistic models for 
the cosmological 21-cm signal, several diffuse foreground 
components, and instrumental noise. As this is just the 
first test of HIEMICA, for simplicity we assume that the 
primary beam is unity for all frequency channels and 
assume complete uv-coverage. We leave detailed investi¬ 
gations about more realistic observations to future work. 

Because an interferometer only measures temperature 
fluctuations around the mean and is insensitive to the 
mean value of brightness temperature, we set the mean 
of the 3D HI signal and the means of the foreground and 
noise maps at each frequency channel to zero. 

The 3D dirty sky map is simulated in a box with 64 3 
pixels (i.e., 64 pixels per side), covering a 30° x 30° sky 
patch and spanning over 780 — 880 MHz at intervals of 
1.59 MHz, corresponding to redshifts between 0.82 and 
0.61. In comoving coordinates, this box corresponds to 
about 1341 x 1341 x 606 Mpc 3 and the size of each pixel 
is about 21x21x9.5 Mpc 3 . The code was run using the 
best-fit cosmological parameters from the Planck mea¬ 
surements (Planck Collaboration et al. 2014b). 


3.1. HI Signal 


In cosmology the power spectrum is typically repre¬ 
sented in the (k x ,fc||) comoving coordinates. If the ob¬ 
served frequency band is small enough (i.e., probing a 
small range in redshifts) and one uses the flat-sky ap¬ 
proximation, there is a linear mapping between these 
variables: 


k ±D c (z) m _ c( 1 + z) 2 
27 r ’ ^ 2ttH 0 u 2 iE{z) 


(45) 


where E(z) = yAl m (l + z) 3 + Da, is 2 i is the rest fre¬ 
quency of the 21-cm line, D c is the transverse comoving 
distance, £ is the redshift of the observation, Hq is the 
Hubble parameter, c is the speed of light, and D m and Da 
are the normalized matter and dark energy density, re¬ 
spectively. The angular wavenumber k x and the parallel 
wavenumber k\\ are the components of the wavenumber 
k perpendicular and parallel to the line-of-sight direc¬ 
tion, respectively. Therefore inserting Eq. 45 into 13, we 
obtain the relation of the power spectrum defined under 
the different coordinates: 

Pm(k "’ fc|l) = loSfe PHl(U,7?) ' (46) 

The 21-cm brightness temperature and the correspond¬ 
ing 3D power spectrum can be written as: 


Pffl(fc) = (Tm(z)) 2 b 2 (k, z) D 2 (z)P cdm (k) 
. (1 + z) 2 Qb fm(z) 


T 21 (z) -0.084mK- 


E(z) 0.044 0.01 


(47) 

(48) 



A(mK 2 ) 

p 

a 

£ 

Aa 

extragalactic point sources 

57.0 

1.1 

2.07 

1.0 

0.2 

extragalactic free-free 

0.014 

1.0 

2.10 

35 

0.03 

Galactic synchrotron 

700 

2.4 

2.80 

4.0 

0.15 

Galactic free-free 

0.088 

3.0 

2.15 

35 

0.03 


TABLE 1 

Foreground model parameters for angular power 
SPECTRUM Ci { y , v ') USED IN EQ. 51. 


where b{k , z) is the bias parameter and P c dm is the cold 
dark matter power spectrum at the present day. D# is 
the baryon density fraction. D(z) is the growth factor for 
dark matter perturbations defined such that D(0) = 1. 
For the purpose of this paper, we assume 5 = 1 over 
redshift and scale. For simplicity, we also neglect the 
effects of redshift-space distortions caused by the pecu¬ 
liar velocities of HI clouds and galaxies in the HI power 
spectrum since the ICA-based approach is insensitive to 
the detailed shape of the power spectrum. The HI mass 
fraction is expected to increase with redshift and we as¬ 
sume a linear dependence: fm(z) = 0.008(1 + z). The 
matter power spectrum P c dm with the baryon acoustic 
oscillations (BAO) can be parametrized according to the 
simple empirical formula: 


Pcdm(^T5 &|| ) _ \ 

P re f {k _l ? &|| ) 


Akexp (—(k/r) a ) sin 


(W*^- 

\ V ^BAOt 


T 


*? ) 
^BAO|| / 


(49) 

(50) 


where k = yj k\ + fcjp the parameters A , a and r are 

adjusted according to the formula presented in Eisenstein 
& Hu (1998). P re f(fc) is the smooth “no-wiggles” power 
spectrum at 2 = 0, which can be computed from the 
fitting formula given by Eisenstein & Hu (1998). 

The parameters &baot an d &baoh are the sinusodial 
scales in the radial and transverse directions in k-space. 
We choose the following values for these parameters used 
in this paper: A = 1.0, r = 0.1 hMpc -1 , a = 1.4 and 
^BAOi — &BAO|| = 0.060 hMpc -1 . 


3.2. Foregrounds 

We model the foregrounds as isotropic random Gaus¬ 
sian fields described by angular power spectra 
based on Santos et al. (2005). In this paper, we as¬ 
sume that bright resolved point sources have been re¬ 
moved accurately and consider the dominant four dif¬ 
fuse components: Galactic synchrotron emission, Galac¬ 
tic and extra-galactic free-free emission, and extragalac- 
tic radio point sources. The angular power spectrum of 
each source takes the generic form: 


C f (l, z/, z/) = A 




2£ 2 ) 


(51) 

where Vf is the reference frequency with v$ — 130 MHz. 
We list the parameters of the foreground models used 
in this paper in Tab. 1. Based on such models, Fig. 1 
shows the angular power spectrum of each foreground 
contribution for one realization. The corresponding maps 
are shown in Fig. 2. 
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Following Davies et al. (2006), the frequency index a 
of real-world foregrounds varies slightly across the sky. 
For example, in the case of synchrotron emission, the 
direction-dependent spectral index reflects variations of 
the relativistic electrons density and Galactic magnetic 
field. In this paper, for each foreground component, we 
assume the indices in different directions to be Gaussian 
distributed with a mean of a and an rms of A a as shown 
in Tab. 1, consistent with model parameters chosen in Liu 
& Tegmark (2012). 


3.3. Noise 


For each frequency channel, we assume Gaussian noise 
in the m;-plane with zero mean and that the noise maps 
at two different frequencies are uncorrelated. We make 
noise realizations in the w-plane for a given noise power 
spectrum for each frequency. The thermal noise of the 
measurement in units of brightness temperature is given 
by (McQuinn et al. 2006; Ansari et al. 2012a) 


P N (u,0 = (|«(u,0| 2 ) = 


2 rp2 
± sys 

AvAtny 


(52) 


~ 1.9 x 10 -7 mK 2 (l + z v )‘ 


(Tsys V 
V50Ky 


10m \ /10 \ /lMHz\ / 30 days \ 

\m) v^r ) ) 


where z v is the corresponding redshift of the frequency 
z/, T sys is the system temperature, n 5 is the total number 
of redundant baselines within that uj . c pixel with integra¬ 
tion time At for each baseline, Av is the width of the 
frequency channel, and A e is the effective area of each 
individual antenna in the array. 

In order to keep the results as general as possible, we 
do not consider a specific configuration of the array; the 
main goal in this paper is not to discuss instrumental 
effects in 21 -cm power spectrum measurements. Instead, 
we consider two simple scenarios in our simulation to 
illustrate the effectiveness of the presented foreground¬ 
cleaning approach. One scenario assumes a currently 
achievable noise level of — 1 x 10 “ 3 mK for each 
pixel of the m;-plane, and the other one assumes a noise 
level of about 2.2 x 10 -4 mK for future measurements. 
The former has an averaged signal-to-noise ratio over the 
whole data cube about 1 while the latter is about 5. 


4. APPLICATIONS TO SIMULATED DATA 

In this Section we show the results obtained by ap¬ 
plying the HIEMICA algorithm to the simulated data. 
There are two main findings, namely the map reconstruc¬ 
tion and the power spectrum recovery. As the fluctua¬ 
tions of the cosmological 21 -cm signal are expected to be 
isotropic, we focus on the recovery of the averaged 3D 
power spectrum in spherical shells of constant \k\. How¬ 
ever, redshift-space distortions will slightly break spher¬ 
ical symmetry, and we will investigate the consequences 
in a future work. The estimated power spectra are ob¬ 
tained by averaging over the results from 10 independent 
realizations of the simulated data cubes and the associ¬ 
ated statistical errors are obtained from their dispersions. 
Each data cube is for a fixed sky patch that includes the 
same simulated foregrounds combined with independent 
realizations of the instrumental noise and the HI signal. 



Fig. 1.— The angular power spectra derived from the simu¬ 
lated maps of the 21-cm signal (black-thick) and foregrounds at 
z = 0.71 (v = 830 MHz), with a bandwidth of Av = 1.59 MHz. 
The foreground components include Galactic synchrotron radiation 
(red-solid), extragalactic point sources (blue-dashed), and Galac¬ 
tic free-free (green-dotted) and extragalactic free-free (brown long- 
dashed). 


4.1. The Independent Components 

Fig. 3 shows the four Wiener-filtered ICs at v — 830 
MHz found by HIEMICA. The Wiener-filtered compo¬ 
nents are calculated by Eq. 23, and, the corresponding 
map of the i-th component at the frequency bin v is ob¬ 
tained by multiplying such component by the (z/, i)-th 
entry of the mixing matrix, M v As seen from Figs. 2 , 3 
and 4, we find that only the first component is compatible 
with the input Galactic synchrotron contribution, while 
the others have no corresponding physical foregrounds 
and have unphysical spectral and spatial behaviours. 
These unphysical behaviours are due essentially to the 
fact that the realistic astrophysical components have sig¬ 
nificant cross-correlations while our independent compo¬ 
nent analysis assumes mutual independence between the 
source signals. Thus each separated component is just 
a mixture of all the source signals, but the sum of these 
ICs is essentially same as the total astrophysical signal, 
as seen in Figs. 5(a) and 5(b). Meanwhile, the Wiener- 
filtered maps of the 21-cm signal (bottom panel of Fig. 5) 
show the encouraging result that, the 21 -cm signal, even 
in the case of S/N=l, is nearly perfectly recovered across 
the map by HIEMICA. 

There is a slight difference between the low and high 
S/N cases (Figs. 5(d) and 5(e)) at small scales, but no 
apparent difference at large scales is found. The recov¬ 
ered amplitudes of fluctuations for S/N=l are smaller 
than those for S/N=5 by several tens of percent, which 
is essentially due to noise suppression in Wiener-filtered 
maps that is oc S/(S+N), where S/N tends to be low at 
high-A; modes. However, the large-scale fluctuations of 
the 21 -cm signal in both cases are much larger than the 
noise amplitudes, resulting in almost identical large-scale 
fluctuations. 
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Fig. 2 
value (i.e 
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(c) Galactic free-free emission 


(d) Extragalactic free-free emission 


.— The simulated diffuse foreground components for a 30° X 30° patch of sky at i/ = 830 MHz. We have subtracted the mean 
i. DC mode) to reflect an interferometric observation. 



(a) The first foreground component 

. " I 


r 1 ^ 

(b) The second foreground component 
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(c) The third foreground component 
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(d) The fourth foreground component 


Fig. 3. — Wiener-filtered maps of independent components (ICs) reconstructed by applying HIEMICA to the simulated data cube with 
S/N=5 for the same sky patch and frequency as in Fig. 2. Notice that we do not expect the recovered maps to explicitly correspond to the 
input maps since these ICs are assumed to be mutually independent while the true physical components may have correlations. 
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frequency (MHz) 


Fig. 4.— Top panel: HIEMICA-derived coefficients of each col¬ 
umn of the mixing matrix, representing the frequency dependence 
of the ICs, where the red-solid line is for the first component, blue- 
dotted for the second, green-dashed for the third and brown long- 
dashed for the fourth. When applying HIEMICA to simulated data 
cubes, here we have assumed n c =4 and S/N=5. Middle panel: 
the rms brightness temperatures of ICs calculated from the derived 
mixing matrix and their angular power spectra. Bottom panel: 
the rms brightness temperatures of the input physical foreground 
components including Galactic synchrotron radiation (red-solid), 
extragalactic point sources (blue-dashed), and Galactic free-free 
(green-dotted) and extragalactic free-free (brown long-dashed). As 
mentioned in Fig. 3, the temperature fluctuations of the ICs are 
not necessarily the same as the physical components. 

4.2. Determination of the Number of ICs 

When analyzing data with HIEMICA, we have to 
choose the number of Independent foreground Compo¬ 
nents (ICs) to be estimated. Our statistical framework 
for independent component analysis can provide a rigor¬ 
ous determination of this number by using an empirical 
statistical approach, the Likelihood Ratio Test (LRT). 
The LRT provides an objective criterion for selecting 
between possible models by using the ratio of the like¬ 
lihoods on the data. Although in practice evaluating 
likelihood functions is a hard task, we can instead ap¬ 
ply HIEMICA several times with successively increasing 
N c until the recovered power spectra are essentially un¬ 
changed when more components are added into the anal¬ 
ysis. 

The LRT statistic is given by LRT = 2 In C s — 21n£ c , 
in which In C s and In C c denote the logs of the likelihood 
functions for a relatively simpler model s (so-called null 
hypothesis) with fewer parameters and a more complex 
model c (alternative hypothesis), respectively. Asymp¬ 
totically, the LRT statistic follows a y 2 distribution with 
degrees of freedom equal to the difference in the number 
of free parameters between the two models. In our case, 
in analysis of the data with 64 frequency channels and 
30 A bins, adding an additional foreground component 
would increase the degrees of freedom by 64 + 30 = 94. 

The comparison of the fitting model between the mod¬ 
els for n c = 1 and 2 (i.e., labelled as 1 -A 2 ICs in Tab. 2 ) 


and n c = 2 and 3 (as 2 -a 3 ICs) shows that, from the 
values of LRT, InL increases significantly as the num¬ 
ber of ICs increased up to 3. The significance values 
(p — 0) for S/N of 1 and 5 indicate that only one or two 
foreground components would result in an unsuccessful 
fit and the data strongly prefer three foreground com¬ 
ponents. Moreover, especially in the case of high S/N, 
there is no statistically significant difference when using 
more components, given the larger p values (p > 0 . 01 ). 
Here all the values of LRT and p are the median over 10 
independent simulations. Therefore, it is recommended 
to use three components in the reconstructed foreground 
model for our specific simulation parameters in Tab. 1. 
It is worth noticing that, as mentioned by Alonso et al. 
(2015), the optimal number of ICs in fact strongly de¬ 
pends on the spectral smoothness of true foregrounds, 
characterized by a frequency correlation length £ as de¬ 
fined in Eq. 51. In fact, a longer coherence length implies 
a smoother frequency spectrum for a physical foreground 
component. Consequently, a smaller value of £ may re¬ 
quire more ICs to successfully model such physical com¬ 
ponents and remove them accordingly. 

4.3. 3D Power Spectrum Results 

Fig. 6 shows the rapid convergence of the resulting HI 
power spectrum when successively increasing the number 
of ICs. It can be seen that the recovered power spectrum 
for IC number of 2 is slightly different from those for IC 
number of 3 and 4, but the 3 and 4 components result 
in almost the same power spectrum, which is consistent 
with the varying trend in the LRT. It can also be con¬ 
firmed by comparing the reconstructed Wiener-filtered 
ICs in Fig 3. As the amplitude of fluctuations of the sec¬ 
ond component (see Fig. 3(b)) is indeed about 20 times 
larger than the input 21-cm signal (see Fig. 5(c)), the 
removal of the second component is therefore necessary. 
Furthermore, since the averaged amplitude of fluctua¬ 
tions of the third component (see Figs. 3(c) and 4) is 
about 10 times smaller than the 21 -cm signal, the effects 
of such component in signal recovery are expected to be 
less important but can still lead to notable improvements 
at large scales as the corresponding 3D Fourier modes 
are not significantly smaller than those of the HI signal. 
Moreover, the amplitude of fluctuations of the fourth 
component (see Fig. 3(d)) is about two times smaller 
than the third one, leading to rather small effects in sig¬ 
nal reconstruction. 

The comparison between the estimated signal power 
spectra for S/N=l and 5 in Fig. 6 , we note that, for 
k < 0.2 Mpc -1 over which the averaged S/N> 1 for 
both cases, the impact of the noises at these scales tends 
to be negligibly small since the recovered power spectra 
follow the true spectrum very closely and there are no 
apparent differences between the two cases. On the other 
hand, the recovered power spectrum for S/N=l begins 
gradually to overestimate the true one at k > 0.2 Mpc -1 
by up to 12 %, but for S/N=5 the recovered spectrum is 
almost same as the true one with only 2% deviation. The 
additional power in the former case is primarily because 
the large noise level causes a slight underestimate of the 
true foreground contributions. The “excess power” is 
barely significant; the simulation truth is still contained 
in the 1 -cr error bars. 

In Figs. 6 (c) and 6 (d), we zoom in on the region with 
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(a) Input total foregrounds 


(b) Recovered total foregrounds 
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(d) Recovered 21-cm signal (S/N=l) 


(e) Recovered 21-cm signal (S/N=5) 


Fig. 5.— Same as Fig. 3, but for the total foregrounds and the 21-cm signal to clearly illustrate that a successful separation was 
achieved by the HIEMICA-cleaning process. There is an underestimate of the 21-cm signal in the case of S/N=l, which is caused by noise 
contamination as a Wiener-filtered map oc S/(S + N). An almost perfect recovery occurs over all scales for S/N=5. 


BAO features to illustrate clearly the dependence of the 
power spectrum estimation on the number of ICs as well 
as on the noise amplitude. We can see that too few com¬ 
ponents, such as n c = 2 in both cases of S/N=l and 5, re¬ 
sults in an overestimate on the power spectrum at scales 
of0.1<&<0.2 Mpc -1 , since those two components can 
not completely describe the spectral and spatial proper¬ 
ties of the foreground. As the number of ICs increases to 
3 and 4, the resulting power spectra decrease and both 
rapidly converge to the true values. As the S/N drops 
down by a factor of 5, the measurement errors become 
larger since such errors can be approximately estimated 
by a{k) oc y-P hiO) +-Pn ( k ). More importantly, one can 
see that, if n c = 3 or 4, the recovered HI spectrum is well 
within the 2-cr error bars of the estimates. 

Notice that we have applied a correction to convert 
the prediction intervals on the underlying power spec¬ 
trum into error bars that approximate Bayesian power 
spectrum inference in the limit of low noise and small 
numbers of modes. The resulting upper and lower error 
bars are asymmetric. The detailed analytical estimations 
of such correction are presented in the Appendix. As a 
consequence, compared with the uncorrected errors, the 
upper error bars increased by about 60%, 40% and 20% 
respectively in the lowest three /c-bins and the associated 
lower error bars decreased by about 40%, 15%, 10%. The 
differences are less than a few percent in the higher k- 
bins due to the large number of modes contained within 
those bins. 

We can also see that the recovered cosmological signal 
decreases approximately monotonically with increasing 
number of ICs, especially for the low -k modes (k < 0.1 
Mpc -1 ). This result may be interpreted as follows: (1) 


since each recovered foreground component is a mixture 
of all the source signals, including the cosmological sig¬ 
nal, a small amount of cosmological signal can leak into 
the reconstructed ICs, strongly biasing and reducing the 
amplitude of the derived cosmological signal; (2) the an¬ 
gular power spectrum of the dominant foreground com¬ 
ponent, Galactic synchrotron emission, has the largest 
amplitude for small i (see Fig. 1), resulting in a strong 
contamination in the cosmological signal at large scales in 
the transverse direction while increasing the risk of leak¬ 
age; (3) the performance of the ICs-based reconstruction 
also depends on the number of measured Fourier modes 
in each &-bin, and, in the simulated data the lowest and 
the second lowest &-bins contain about 500 and 50 times 
fewer modes than those in the high /c-bins, respectively. 

As discussed before, using the LRT, the three- 
component foreground model used in HIEMICA gives 
rise to the successful recovery of the HI power spec¬ 
trum. In order to evaluate the recovery performance, 
we present the derived power spectrum in Fig. 7 in com¬ 
parison with that from the existing commonly-used PC A 
technique. We can see that, after applying PC A to sub¬ 
tract the three dominant foreground components from 
the data cube, the residual apparently underestimates 
the HI power spectrum at large scales, which seems to be 
an over subtraction of the foregrounds. Compared with 
the input ones, the amplitudes in the first two /c-bins are 
smaller by factors of 14.1 and 3.7 if S/N=l, and by fac¬ 
tors of 16.1 and 4.1 if S/N=5, respectively, whereas the 
ratios of HIEMICA-derived power spectra and the input 
ones in those two bins are only about 91 — 92% ifS/N=l 
and 91 — 94% if S/N=5, respectively. In addition, the 
recovered amplitudes in those two /c-bins are consistent 
with the input values within 2-cr. The HIEMICA method 
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Fig. 6.— Dependence of the 21-cm signal recovery on the number of independent components, for the data with S/N=l (left) and 5 
(right). Upper panels: We show the spherically averaged three-dimensional power spectra of the simulated 21-cm signal (black), noise 
(black-dotted), reconstructed 21-cm signal for the HIEMICA algorithm run with the assumption of n c = 2 (red-dotted), 3 (green-dashed) 
and 4 (blue long-dashed), respectively. Vertical bars indicate the l-cr errors estimated from 10 realizations and horizontal bars the bin-width 
of A k = 0.0129. The statistical uncertainties are typically smaller than the symbol sizes and mostly invisible. Lower panels: Same as 
upper panels, but showing the HI power spectrum divided by a smoothed spectrum without baryon acoustic oscillations to single out those 
oscillations. 
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TABLE 2 

Comparison of models by the Likelihood Ratio Test (LRT) for the number of independent foreground components. 


Model 

A dof 

LRT (S/N=l) 

LRT (S/N=5) 

p -value (S/N=l) 

p -value (S/N=5) 

1 -A 2 ICs 

94 

17558 

20872 

ot 

ot 

2 -A 3 ICs 

94 

196 

280 

< io- 6 t 

< io- 6 t 

3 —► 4 ICs 

94 

120 

125 

0.036 

0.018 

4 ^ 5 ICs 

94 

96 

108 

0.42 

0.15 


tp < 0.01, typically used to assess the significance of LRT. 




k(Mpc _1 ) 


k(Mpc -1 ) 


(a) S/N=l 


(b) S/N=5 


Fig. 7.— Same as Fig. 6, but for comparison of 21-cm power spectrum recovery based on the PC A technique (magenta-dotted) and the 
HIEMICA approach (gren long-dashed), for the data with S/N=l (left) and S/N=5 (right). The spherically averaged three-dimensional 
power spectra of the simulated 21-cm signal(black-solid), noise (black-dotted), HIEMICA-derived (assumed n c — 3) 21-cm signal and PCA- 
derived residuals from projecting out the first three dominant eigen-components based on the frequency-frequency covariance of data. 
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thus does a much better job at foreground removal and 
cosmological signal recovery at large scales. 

More promisingly, we find that the HIEMICA ap¬ 
proach provides an unbiased estimate of HI power spec¬ 
trum for the high-A: modes with & > 0.1 Mpc -1 where 
S/N tends to be low, whereas the power spectrum of 
the PCA-derived residual in the noise-dominant regime 
seems to deviate strongly from the true signal, resulting 
in an overestimate of HI measurements. 

Since the isotropic HI signal has the same statis¬ 
tical property as that of the assumed instrumental 
noise, commonly-used ICA-based approaches have trou¬ 
ble breaking the degeneracy, leaving residuals consisting 
of the reconstructed 21-cm signal, noise and fitting er¬ 
rors. As proposed in Chapman et al. (2012); Bonaldi 
& Brown (2015), after foreground cleaning by CCA or 
FASTICA, one has to manually subtract the noise power 
spectrum from that of the residual data cube and ob¬ 
tain a comparable amplitude with the true signal. How¬ 
ever, those previous studies show that such a scheme 
can cause an overfitting problem and lead to some nega¬ 
tive or zero-closed amplitudes of the residual power spec¬ 
trum in noise-dominated k- modes, implying that a leak¬ 
age of HI signal into the subtracted foreground compo¬ 
nents is likely to be serious in regions with low S/N. 
Some other schemes such as weighting maps with inverse 
noise variance to build the frequency covariance used in 
PCA (Alonso et al. 2015) could mitigate such noise con¬ 
tamination. 

In contrast to the above-mentioned standard PCA- and 
ICA-based approaches, the proposed HIEMICA method 
in fact provides an unbiased and efficient estimate of 
Gaussian components, derived from the maximum like¬ 
lihood principle. If the noise covariance matrix N is ex¬ 
actly known, this method can optimally estimate the un¬ 
derlying 21-cm signal and foreground covariance matri¬ 
ces. It avoids any artificial effects that can occur when 
manually subtracting the noise power spectrum from the 
residuals after the foreground cleaning processes. Our 
simulations do confirm this, showing that the ensemble 
averaged recovered HI power spectrum converges to the 
true value even in low S/N regions where the noise is 
about an order of magnitude higher. 

We note that our semi-blind approach requires exact 
knowledge of the noise in order to infer the 21-cm power 
spectrum accurately; an incorrect noise covariance ma¬ 
trix can bias the signal estimate. However, a relatively 
lower accuracy in modelling the noise covariance matrix 
may be sufficient for the power spectrum estimation if 
the uncertainty in the noise level is not greater than the 
signal. In general, the corresponding impacts on the es¬ 
timated parameters can be evaluated by a Monte-Carlo 
approach. But qualitatively, the level of the uncertainty 
in the estimated 21-cm power spectrum is about of the 
same order of magnitude as that of the noise. A sim¬ 
ple way to understand this is to consider that in the 
absence of instrumental effects, if the noise is assumed 
to be uncorrelated, the value of Phi which maximizes 
the likelihood function can be approximately estimated 
by Phi ~ Pd — Pf — Pn where Pd, Pf and Pn are the 
power spectra of the data points, the foreground and 
noise, respectively. As the total amplitude of the fore¬ 
ground fluctuations is significantly larger than that of 
the noise fluctuations, one can expect that the estimate 


of the foreground power spectrum roughly remains un¬ 
changed when slightly varying the input noise amplitude. 
Thus assuming an input noise with an amplitude either 
greater or smaller than the true one would lead to ei¬ 
ther an underestimate or an overestimate of the 21-cm 
power spectrum, i.e., APhi ~ — AP^. In order to check 
the above estimate, we test the effect of using an input 
noise power spectrum with 10 — 50% larger or smaller 
amplitudes than the true one and find that the resulting 
changes APhi are indeed consistent with our expected 
variations (—APn) within a factor of 2 over all scales. 
In addition, incorrect noise estimates would also bias 
the estimate for the PCA method, since the debiasing 
Monte Carlo simulations would give the wrong results if 
the noise level were misspecified. 

In more realistic cases, an experiment may exhibit a 
low-level of cross-correlated Gaussian or non-Gaussian 
noise, both of which are unknown. In the former case, 
similar to the above qualitative estimation, we expect 
that the small level of correlation will slightly bias the 
signal estimate but not significantly affect it. This is 
because, as mentioned above, (1) the HI power spectrum 
is approximately determined by the difference among the 
diagonal components of the data point, the foreground, 
and noise model, not by their off-diagonal ones, and (2) 
the estimate of the foreground is unchanged or insensitive 
to the noise model. In the latter case, as the variance of 
the non-Gaussian noise is non-negative, the HI power 
spectrum is thus likely to be slightly overestimated. We 
will investigate the impact of noise misspecification in a 
future work. 

Also, this blind component separation algorithm makes 
no assumptions about the spectral information of the 
foreground and can naturally include all instrumental ef¬ 
fects so as to be able to coherently estimate all source 
components and the associated errors. More detailed 
comparisons between HIEMICA and other approaches 
for data in the presence of realistic instrumental effects 
will be made in future work. 

5. CONCLUSIONS 

In this paper we present a non-parametric source sepa¬ 
ration algorithm, HIEMICA, which is an extension to the 
3D version of the SMICA method proposed by Snoussi 
et al. (2002); Delabrouille et al. (2003) for source identi¬ 
fication in noisy mixtures of CMB maps. The HIEMICA 
algorithm is a fully Bayesian framework to infer the 3D 
power spectrum and maps of the underlying HI signal 
and the spatial power spectra and the frequency depen¬ 
dence (i.e. mixing matrix) of uncorrelated foreground 
components. We adapt the EM algorithm to efficiently 
maximize the likelihood function for unknown parameter 
estimates. As the statistical properties of the cosmolog¬ 
ical signal are significantly different from those of each 
astrophysical source, the spectral matching method in 
principle can blindly separate the HI signal from highly 
foreground-contaminated maps. 

The simulations show that HIEMICA is able to suc¬ 
cessfully reconstruct the HI 3D power spectrum across all 
scales and is much more robust than the PCA approach, 
which removes some of the 21-cm signal at large scales 
(k < 0.1 Mpc -1 ) and overestimates the HI power spec¬ 
trum in the noise-dominated regions where noise leaks 
into the estimated signal. To evaluate the impact of 
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the number of ICs in signal recovery we adapt the LRT 
method to rigorously assess the likelihood fit so as to de¬ 
termine the optimal number of foreground components 
used in the HIEMICA analysis. We find for the number 
of ICs higher than 3, a number comparable to that of the 
simulated astrophysical components, the reconstruction 
does not gain significant improvements and converges to¬ 
wards a stable result consistent with the input HI power 
spectrum. 

Although our results are quite promising, the simu¬ 
lated observations in this study are idealized, with no 
instrumental effects. As instrument effects such as the 
frequency-dependent uv sampling and primary beam can 
significantly complicate the foreground removal and bias 
the signal estimate, it is important to test HIEMICA in 


more detail in the context of specific experiment set-ups 
in future work. 
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APPENDIX 


We here present a derivation of an approximate expression for the posterior distribution of the signal power spectrum 
in terms of the sampling distribution of the estimator so as to calculate Bayesian credible intervals of the inferred 
parameter. The sampling distribution of the estimator in our case can be derived from a Monte Carlo (MC) distribution 
of the Maximum Likelihood Estimator (MLE) of the signal power spectrum P. 

The power spectrum is simply the variance of the Fourier mode coefficients. For simplicity, if we assume that the 
all the Fourier modes are independent and the variance of all the Fourier mode coefficients within the bin is the same, 
the posterior distribution of P given data di for i = 1, • • • , n can be expressed in an ideal noise-free case as 

p{P\d) (x C{d\P)p{P) (x p-%e-^?=i d ^ p p-\ (1) 

where the non-informative Jeffreys prior (Jeffreys 1961), p(P) = P _1 , has been applied on the power spectrum. As 
we know, the MLE P = ^ * s unbiased over data sets drawn from 

(P) = P ( 2 ) 


and has variance 

((A P) 2 ) = ^P 2 . (3) 

If we want to construct an approximate Bayesian posterior from MC samples from the sampling density of the MLE 
P and the effective number of degrees of freedom is unknown, then we can determine it from the mean and variance 
of P using 


h = 2 



(4) 


Assuming that noise is subdominant, inserting Eq. 4 into Eq. 1, we can use P and the effective number of degrees 
n of freedom to approximate the posterior for P as 

p(P\d)9Se~^ p /P^ , 


(5) 
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from which the asymmetric credible interval can be derived. 



